

# **NBTI-aware Skew Minimization Techniques**

Ram Rakesh Department Of Electronics and Comm. Engg, Govt. Polytechnic, Hisar (Haryana), INDIA ramrakesh37@gmail.com

**Abstract:** Clock distribution remains an important challenge in the design of large-scale digital chips, and the challenge has grown with technology scaling. Most modern clock distributions are single-wire: a single clock phase is distributed globally, while multiple clock phases are created locally with clock shapers or clock buffer circuits. A two-phase clocking scheme is important for designing high-performance systems, and it is also motivated by low power. C<sup>2</sup>MOS latches are good candidates for low-power operation. For low-power, high-speed applications, non-overlapping two-phase clocking can be combined with a pipeline of two blocks built from latches and flip-flops. Two-phase clocking relaxes the circuit-delay constraints required for correct operation of the system. In larger digital systems with higher clock rates, the need for a sufficient skew margin motivates arranging the circuit blocks so that the degradation caused by NBTI can be controlled; the best solution is to arrange the blocks in a pipelined fashion. Higher speed and lower power consumption are the main advantages of pipelining. Taking the impact of NBTI and process variations into account, a novel flow reduces clock skew by selectively replacing standard-Vth clock buffers with their high-Vth counterparts. The area overhead is negligible, and the power consumption is reduced as well.

Keywords: NBTI, Skew, Two Phase Clocking, Pipelining.

#### Introduction

A clock signal defines the relative time reference for the movement of data. The most common practice for clock distribution is to insert a number of buffers between the clock source and the flip-flops. Clock skew is the maximum difference in clock arrival time among the clock paths.

Because clock skew degrades the performance of synchronous sequential circuits, minimizing skew is an important task.

On the other hand, as devices shrink into the deep-submicron range, negative bias temperature instability (NBTI) becomes an important concern for the long-term reliability of circuits.

In a PMOS transistor, the gate-oxide electric field strongly accelerates NBTI, which is very sensitive to increases in that field. As the vertical electric field increases, it speeds up hydrogen ions and causes them to be trapped, forming fixed positive oxide charge. This induces a larger threshold-voltage shift, i.e., a stronger NBTI effect.

According to the hydrogen-release model, the generation of traps and oxide defects is correlated, and both depend on the release of hydrogen species at the Si/SiO<sub>2</sub> interface.



Fig. 1: Schematic representation of the reaction-diffusion (R-D) model.



As a result, the delay characteristics of PMOS transistors change over time. The NBTI-induced delay degradation of a logic gate is linearly proportional to the number of output activities.

Because of clock gating, clock gates often have different active probabilities, which lead to different delay degradations. The difference in delay degradation among clock paths results in an **additional clock skew called aging skew**. If the aging skew is too large, the circuit will fail to function at some later time. Thus there is a strong need to control the aging skew properly.

#### Motivation

Clock distribution remains an important challenge in the design of large-scale digital chips, and the challenge has grown with technology scaling. Most modern clock distributions are single-wire: a single clock phase is distributed globally, while multiple clock phases are created locally with clock shapers or clock buffer circuits.

The challenge of clock distribution is to deliver the clock simultaneously everywhere (no skew) and periodically everywhere (no jitter), using minimum power and wiring resources and causing minimum degradation to the circuits.

A two-phase clocking discipline for digital ICs can guarantee freedom from clock skew, critical races, and other timing problems. It consists of two main parts: a notation for signal types and composition rules for legal circuits.

In fact, the notation and rules define a syntax of clocking-correct circuits that can be checked by simulation tools.

#### **Assumptions:**

The clocking discipline rests on three basic assumptions:

1. All inputs and initial values are digital.
2. The system is two-phase and synchronous.
3. All logic and wiring delays are positive and bounded; no knowledge of the relative delays among circuit paths is needed.

A two-phase clocking system is a very practical method for designing most MOS ICs. It can be shown that two phases are needed to avoid critical races when the relative delays between different paths are unknown. Although the impact of NBTI on functional circuits has been extensively studied and many solutions have been proposed, the effect of NBTI on clock trees has not yet been fully examined.

## **Related Work:**

Process variation is another concern in increasing the performance of digital ICs. Rodgors et al. inserted multiple types of gates into the clock tree to gate the clock paths, where a clock path is defined as the signal path from the clock source to a sink D flip-flop. Clock skew remained unsolved because of the number of gates and buffers used, and process variations could not be avoided. Borkovie et al. added extra circuitry to simplify the clock control logic in order to reduce skew. Again, their approach was not suitable for power minimization and was not applicable to all implementations; more importantly, process variations were not considered. Nardy et al. considered process variations for skew minimization in deep-submicron technology. Their results show a significant effect of process variations, providing

Volume-7 • Number-1 Jan -June 2015 pp. 37-46

A few of the above works target the aging effect, a dominant source of clock skew, especially in smaller technology nodes. Cohn (J.M.) et al. proposed a solution to reduce clock skew due to NBTI, in which the clock signal on a particular path was guard-banded with a safety margin set at exactly half of the skew degradation. Although process variations were considered, the approach has limited scope because of the safety margin. Chakraborty et al. [6] tried to equalize the signal probability of all clock paths according to run-time operation and achieved a significant reduction of NBTI-induced skew.

They then proposed an optimization algorithm to select NAND or NOR gates at the output stage of integrated clock gating (ICG) cells. This made it possible to control whether the ICG output remains at '0' or '1', balancing the degradation rates of the clock paths and minimizing NBTI-induced clock skew.

## **NBTI Effect:**

JJEE

The NBTI effect results from the generation of interface charge at the Si/SiO<sub>2</sub> interface, a process that is modeled analytically by the reaction-diffusion (R-D) model. NBTI is highly dependent on the bias condition. Under stress ($V_{gs} = -V_{dd}$, where $V_{dd}$ is the supply voltage), the NBTI effect increases and the PMOS threshold voltage rises. Under recovery ($V_{gs} = 0$), part of the NBTI effect is recovered and the threshold voltage decreases. The corresponding long-term model is formulated as:

$$\Delta V_{th}(t) = \left(\frac{\sqrt{K_v \alpha T_{clk}}}{1 - \beta_t^{1/(2\phi)}}\right)^{2\phi} \tag{1}$$

$$\beta_t = 1 - \frac{2\epsilon_1 t_c + \sqrt{\epsilon_2 C (1-\alpha) T_{clk}}}{2 t_{ox} + \sqrt{C t}} \tag{2}$$

$$K_v = \left(\frac{q t_{ox}}{\epsilon_{ox}}\right)^3 K^2 C_{ox} (V_{gs} - V_{th}) \sqrt{C} \exp\left(\frac{2E_{ox}}{E_0}\right) \tag{3}$$

where $\alpha$ is the duty ratio, $T_{clk}$ is the clock period, $t$ is the stress time, and the other parameters are physics-related constants.

Equations (1) and (3) show that $|\Delta V_{th}(t)|$ depends exponentially on the initial threshold voltage $V_{th}$, through $K_v$.

It follows that a low-$V_{th}$ PMOS transistor has a faster degradation rate, so its $\Delta V_{th}(t)$ grows more quickly.
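To make the threshold-voltage dependence concrete, here is a minimal Python sketch that evaluates Equations (1) to (3) for a low-$V_{th}$ and a high-$V_{th}$ buffer. It is a sketch under stated assumptions, not calibrated device modeling: the lumped constant `prefactor` (standing in for $(q t_{ox}/\epsilon_{ox})^3 K^2 C_{ox}\sqrt{C}$), the oxide field $E_{ox} = (V_{gs} - V_{th})/t_{ox}$, $\phi = 1/6$, and every numeric value are assumptions in normalized units, chosen only to show trends. PMOS voltages are handled as magnitudes.

```python
import math

# Minimal sketch of Eqs. (1)-(3). ALL constants are hypothetical, normalized
# values chosen only to show trends; they are not calibrated device data.
# PMOS voltages are handled as magnitudes |V_gs|, |V_th|.

def k_v(vgs, vth, tox=1.0, e0=1.0, prefactor=1e-9):
    """Eq. (3) with (q*t_ox/eps_ox)^3 * K^2 * C_ox * sqrt(C) lumped into `prefactor`."""
    eox = (vgs - vth) / tox  # assumption: oxide field E_ox = (V_gs - V_th) / t_ox
    return prefactor * (vgs - vth) * math.exp(2.0 * eox / e0)

def beta_t(t, alpha, t_clk=1.0, eps1=0.1, t_c=1.0, eps2=0.5, c=1.0, tox=1.0):
    """Eq. (2): recovery factor after total stress time t at duty ratio alpha."""
    num = 2.0 * eps1 * t_c + math.sqrt(eps2 * c * (1.0 - alpha) * t_clk)
    return 1.0 - num / (2.0 * tox + math.sqrt(c * t))

def delta_vth(t, alpha, kv, t_clk=1.0, phi=1.0 / 6.0):
    """Eq. (1): long-term threshold-voltage shift |dV_th(t)|."""
    bt = beta_t(t, alpha, t_clk)
    return (math.sqrt(kv * alpha * t_clk) / (1.0 - bt ** (1.0 / (2.0 * phi)))) ** (2.0 * phi)

kv_low = k_v(vgs=1.0, vth=0.30)   # low-Vth buffer
kv_high = k_v(vgs=1.0, vth=0.40)  # high-Vth buffer
for t in (1e4, 1e5, 1e6):
    print(f"t={t:.0e}  low-Vth dVth={delta_vth(t, 0.5, kv_low):.4f}  "
          f"high-Vth dVth={delta_vth(t, 0.5, kv_high):.4f}")
```

With these placeholder constants the low-$V_{th}$ buffer shows the larger shift at every stress time, and the shift grows with both $t$ and the duty ratio $\alpha$; only these trends, not the absolute numbers, carry over from the model.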

The alpha-power law provides an analytical model relating $V_{th}$, $\Delta V_{th}(t)$, and the gate delay $D_i$:

$$D_i = \frac{C_i V_{dd}}{\beta_i \left[ V_{dd} - (V_{th} + \Delta V_{th}(t)) \right]^{r}} \tag{4}$$

where $C_i$ is the effective capacitive load connected to gate $i$, $\beta_i$ depends on the gate sizing, and $r$ is the velocity-saturation index of the alpha-power law. Jifeng Chen et al. [1] used minimum-sized buffers and showed that buffers with a lower initial $V_{th}$ tend to degrade faster, while those with a higher initial $V_{th}$ degrade more slowly.
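Equation (4) also shows how aging skew arises. The Python sketch below sums buffer delays along two clock paths that are identical at time 0, then applies a larger threshold shift to the path whose buffers switch more often. The normalized values ($C_i = \beta_i = 1$, $V_{dd} = 1$ V, $V_{th} = 0.35$ V), the index $r = 1.3$, and the per-buffer shifts of 0.03 V and 0.01 V are all hypothetical.

```python
# Minimal sketch of Eq. (4) and of "aging skew". All parameter values are
# hypothetical and normalized (C_i = beta_i = 1, V_dd = 1 V); r = 1.3 is an
# assumed alpha-power index.

def gate_delay(vth, dvth, c_load=1.0, vdd=1.0, beta_i=1.0, r=1.3):
    """Eq. (4): delay of one gate whose threshold has shifted by dvth."""
    return c_load * vdd / (beta_i * (vdd - (vth + dvth)) ** r)

def path_delay(vth, dvths):
    """A clock path's delay is the sum of its buffer delays (one dvth per buffer)."""
    return sum(gate_delay(vth, dv) for dv in dvths)

def skew(delays):
    """Clock skew: the largest difference in arrival time among clock paths."""
    return max(delays) - min(delays)

VTH = 0.35
fresh = [path_delay(VTH, [0.0] * 4), path_delay(VTH, [0.0] * 4)]
# Hypothetical shifts: path 0 is active more often, so its buffers degrade more.
aged = [path_delay(VTH, [0.03] * 4), path_delay(VTH, [0.01] * 4)]
print(f"skew at t=0: {skew(fresh):.4f}, aging skew at t: {skew(aged):.4f}")
```

The two paths start perfectly balanced, so any skew after stress is purely aging skew caused by unequal $\Delta V_{th}$.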

#### Why two phase clocking scheme

A two-phase clocking scheme is important for designing high-performance systems, and it is also motivated by low power. $C^2MOS$ latches are good candidates for low-power operation. For low-power, high-speed applications, non-overlapping two-phase clocking can be combined with a pipeline of two blocks built from latches and flip-flops. Two-phase clocking relaxes the circuit-delay constraints required for correct operation of the system.

A single-phase clocking scheme imposes a number of tight constraints on circuit delay, clock period, and clock width that must be met for correct operation. When several chips are put together to form a complete system, clock skew becomes a serious problem; even within a single large or complex chip, skew must be analyzed thoroughly to avoid faulty operation. Dynamic schemes such as NORA (NO RAce) that use a true two-phase clock $\Phi_1$ and $\Phi_2$ have been used to avoid race problems caused by clock skew.

Magnus Karlsson et al. provide a design margin for eliminating race problems caused by clock skew.



Fig. 2: Non-overlapping two-phase clock waveforms

A C<sup>2</sup>MOS latch requires the inverses of the two clock phases $\Phi_1$ and $\Phi_2$. The inverted clocks can be generated locally with two small inverters, or distributed globally over a larger area of the chip using four wires, i.e., a pseudo two-phase clock.

## Two Phase clock generator


The block diagram used by Sjostrom et al. is shown in Fig. 3. Two non-overlapping clock phases $\Phi_1$ and $\Phi_2$ are derived from a single-phase clock $\Phi$ running at twice the clock rate. A divide-by-two circuit halves the clock rate, producing two internal clock signals $2\Phi$ and $\overline{2\Phi}$. The two clock phases $\Phi_1$ and $\Phi_2$ are then obtained with AND gates.





The divide-by-two circuit must be reset to align $\Phi_1$ and $\Phi_2$ with the main clock $\Phi$. The resulting clocks $\Phi_1$ and $\Phi_2$ each have a duty cycle of 25% of the clock period, which provides a skew margin of 25% of the clock period. The penalty is a speed degradation due to the shorter evaluation phase.
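The generator can be checked behaviorally. The Python sketch below samples the double-rate clock $\Phi$ in integer ticks, models the divide-by-two as a toggle that is reset to 0 and flips on every rising edge of $\Phi$ (its output and inverse play the roles of $2\Phi$ and $\overline{2\Phi}$), and forms $\Phi_1$ and $\Phi_2$ with AND gates. The tick resolution, the reset value, and the function names are assumptions for illustration, not details taken from Sjostrom et al.

```python
# Minimal behavioral sketch of the divide-by-two + AND-gate generator.
# Time is sampled in integer ticks; the tick resolution and the divider's
# reset value are assumptions for illustration.

def two_phase_clock(ticks_per_phi=8, n_phi_cycles=4):
    """Return sampled (phi, phi1, phi2) waveforms as lists of 0/1.

    phi is the double-rate master clock (50% duty, high in the first half of
    each period). q is the divide-by-two output, reset to 0 and toggled on every
    rising edge of phi. n_phi_cycles should be even so that every output period
    is complete.
    """
    half = ticks_per_phi // 2
    phi, phi1, phi2 = [], [], []
    for t in range(ticks_per_phi * n_phi_cycles):
        p = 1 if t % ticks_per_phi < half else 0
        q = (t // ticks_per_phi + 1) % 2   # rising edges seen so far, modulo 2
        phi.append(p)
        phi1.append(p & q)        # phi1 = phi AND q
        phi2.append(p & (1 - q))  # phi2 = phi AND (NOT q)
    return phi, phi1, phi2

def duty_cycle(wave):
    """Fraction of samples for which the waveform is high."""
    return sum(wave) / len(wave)

phi, phi1, phi2 = two_phase_clock()
overlap = any(a and b for a, b in zip(phi1, phi2))
print("phi1:", "".join(map(str, phi1)))
print("phi2:", "".join(map(str, phi2)))
print("duty(phi1) =", duty_cycle(phi1), "duty(phi2) =", duty_cycle(phi2), "overlap =", overlap)
```

In each 16-tick period of $\Phi_1$, each phase is high for 4 ticks, and 4 idle ticks separate the falling edge of $\Phi_1$ from the rising edge of $\Phi_2$; that gap is the 25% skew margin described above.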

## Timing pipelined system

In larger digital systems with higher clock rates, the need for a sufficient skew margin motivates us to arrange the circuit blocks so that we can control the degradation induced by NBTI.

Higher speed and lower power consumption are the main advantages of pipelining.



Fig. 4: Pipelining

## Important formulae


The propagation delay $T_d$ of a CMOS circuit can be calculated as

$$T_d = \frac{C_{charge} V_0}{K (V_0 - V_t)^2} \tag{5}$$

where $V_t$ is the threshold voltage, which depends on NBTI, $C_{charge}$ is the capacitance to be charged or discharged in a single clock cycle, $V_0$ is the supply voltage, and $K$ is a process-dependent constant.

The power consumption of a CMOS circuit can be calculated as

$$P_{CMOS} = C_{TOTAL} V_0^2 f \tag{6}$$

The power consumption of the original sequential circuit is given by

$$P_{Seq} = C_{TOTAL} V_0^2 f, \qquad f = \frac{1}{T_{Seq}}$$

where $T_{Seq}$ is the clock period of the sequential circuit.

In an M-level pipelined system, the critical path is reduced to 1/M of its original length, and the capacitance to be charged or discharged in a single clock cycle is also reduced to 1/M of its original value. If the same clock speed (clock frequency $f$) is to be maintained, only a fraction 1/M of the original capacitance is charged or discharged in the same amount of time.

This implies that the supply voltage can be reduced to $\beta V_0$ ($0 < \beta < 1$).

Hence the power consumption of the pipelined circuit is

$$P_{PIP} = C_{TOTAL} \beta^2 V_0^2 f = \beta^2 P_{Seq}$$

The propagation delay of the pipelined circuit can be computed as

$$T_{PIP} = \frac{(C_{charge}/M)\, \beta V_0}{K (\beta V_0 - V_t)^2} \tag{7}$$
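Equation (7) leaves $\beta$ implicit. Requiring the pipelined circuit to keep the original clock speed, i.e. setting Equation (7) equal to Equation (5), gives $M(\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$, a quadratic in $\beta$. The Python sketch below solves it and reports the power ratio, which by Equation (6) is $\beta^2$ when $C_{TOTAL}$ and $f$ are unchanged. The supply $V_0 = 5$ V, $M = 2$, and the NBTI-raised threshold of 0.65 V are hypothetical values, and $K$ is normalized to 1.

```python
import math

# Sketch: choose beta so that the M-level pipelined circuit keeps the original
# clock speed, i.e. Eq. (7) equals Eq. (5). Voltages below are hypothetical.

def t_seq(c_charge, v0, vt, k=1.0):
    """Eq. (5): delay of the original (non-pipelined) circuit."""
    return c_charge * v0 / (k * (v0 - vt) ** 2)

def t_pip(c_charge, m, beta, v0, vt, k=1.0):
    """Eq. (7): delay of the M-level pipelined circuit at supply beta*V0."""
    return (c_charge / m) * beta * v0 / (k * (beta * v0 - vt) ** 2)

def supply_scale(m, v0, vt):
    """Solve M*(beta*V0 - Vt)^2 = beta*(V0 - Vt)^2 for beta.

    Expanded: M*V0^2*beta^2 - (2*M*V0*Vt + (V0 - Vt)^2)*beta + M*Vt^2 = 0.
    The larger root is taken because only it satisfies beta*V0 > Vt.
    """
    a = m * v0 ** 2
    b = -(2 * m * v0 * vt + (v0 - vt) ** 2)
    c = m * vt ** 2
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

V0, M = 5.0, 2
for vt in (0.60, 0.65):  # 0.65 V stands for a hypothetical NBTI-raised Vt
    beta = supply_scale(M, V0, vt)
    print(f"Vt={vt:.2f} V: beta={beta:.4f}, P_PIP/P_Seq = beta^2 = {beta ** 2:.4f}")
```

With $V_t = 0.6$ V the sketch gives $\beta \approx 0.603$, i.e. roughly 36% of the original power. Raising $V_t$ to 0.65 V forces a larger $\beta$, so an NBTI-induced threshold shift erodes part of the power saving that pipelining buys.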

For larger digital systems with higher clock rates, the need for a sufficient skew margin motivates arranging the circuit blocks so that the degradation caused by NBTI can be controlled; the best solution is to arrange the blocks in a pipelined fashion.

Pipelining is a key element of high-performance design and a straightforward technique for synchronous systems. Complex function blocks are subdivided into smaller blocks, registers are inserted to separate them, and the global clock is applied to all registers. Synchronization is used in the vast majority of digital sequential circuits. Sequential circuits include registers, flip-flops, latches, and memory elements. In this study, flip-flops and latches have been implemented in combinations of one block, two blocks, and many blocks. This work targets a low-power, high-speed combination of flip-flops and latches that performs the same operation: the edge-triggered flip-flop is faster, while the level-triggered latch acts as a low-power device. Latch-based designs have a small die size and are more successful in high-speed designs.



Fig. 5: Schematic diagram of pipelining

For low-power operation, CMOS is chosen instead of NMOS. CMOS logic dissipates less power than NMOS logic because CMOS dissipates power mainly when switching ("dynamic power"). On a typical ASIC in a modern 90 nm process, switching the output might take 120 picoseconds and happens once every ten nanoseconds. CMOS switches have a single-pin control interface that enables maximum circuit layout efficiency. Here, two-phase clocking is implemented with clocked CMOS (C<sup>2</sup>MOS), which prevents glitches and unwanted hazards. The NORA dynamic CMOS technique uses a true non-overlapping two-phase clock signal $\phi$ and $\phi'$ and can avoid race problems caused by clock skew. NORA can provide higher clock rates than the C<sup>2</sup>MOS technique because it has negligible dead time and no skew problem.

Pipelining is a design technique that increases instruction throughput, that is, the number of instructions that can be executed per unit of time. Inverting a single clock can lead to skew problems; instead, employ two non-overlapping clocks for the master and slave sections of flip-flops, and also use two phases for alternating pipeline stages. The use of pipelining is central to high-performance digital system design.



Fig. 6: Block diagram of the pipeline section



Fig. 7: RTL of the complete pipeline block diagram



Fig. 8: Simulation waveforms

## NBTI Aware Clocking Algorithm

Taking the impact of NBTI and process variations into account, the novel flow of Chen and Tehranipoor [1] reduces clock skew by selectively replacing standard-Vth clock buffers with their high-Vth counterparts. An extended "divide and conquer" algorithm is developed to identify the critical clock buffers for replacement. The area overhead is negligible, and the power consumption is reduced as well.

The PTM model can be used to characterize the delay and degradation of the buffers in the standard-cell library for different Vth values, so that a multi-Vth clock buffer library can be characterized. The clock tree structure is obtained in the physical design stage, and other NBTI-related parametric information can be collected using either commercial or our in-house tools. An aging-aware timing analysis using the multi-Vth clock buffer library then yields the delay and degradation of the buffers and the clock paths.

Assuming that the comprehensive timing analysis predicts that the worst-case delay will become $\Gamma$ at time *t*, a safety margin $\Delta\Gamma$ has to be assigned to the circuit at design time.

Combined with the above analysis, the following constraint must be satisfied by a new replacement so as not to violate the design specification:

$$\mathcal{R} = \begin{cases} 1 & \tau_m(0) \leq \Gamma - \Delta\Gamma - \Delta\tau, \\ 0 & \text{otherwise} \end{cases} \tag{8}$$

where $\tau_m(0)$ is the delay of clock path m at time 0 under nominal conditions, without process variations or NBTI impact, and $\Delta\tau$ is the margin specified for the NBTI effect and process variations. The value $\mathcal{R}$ serves as a flag: a replacement is feasible when $\mathcal{R} = 1$. A new buffer replacement is optimal in reducing the clock skew only if the following requirements are satisfied:

$$\begin{cases} \tau_{after}(0) \leq \Gamma - \Delta\Gamma - \Delta\tau \\ SK_{after} < SK_{init} \end{cases} \tag{9}$$

where $SK_{init}$ ($SK_{after}$) is the clock skew before (after) buffer replacement and $\tau_{after}(0)$ is the clock path delay at time 0 after replacement.

## **Clock Buffer Replacement Algorithm**

- 1: Extract X clock paths with L buffers from the clock tree
- 2: Obtain STA information for the X clock paths
- 3: Collect aging information for the X clock paths
- 4: Specify $\Delta\tau$ considering the NBTI effect and process variations
- 5: Specify the stress time t for NBTI aging analysis
- 6: Separate all clock paths into N sets through buffer-overlap analysis
- 7: for all sets n from 1 to N do
- 8: for all paths m from 1 to M in the current set do
- 9: Select a clock buffer k from 1 to K on the current path m
- 10: if buffer k is a non-overlapped buffer among the paths in set n then
- 11: Evaluate the replacement using Equation (9) in the current set
- 12: if it is an optimal replacement then
- 13: Replace the current buffer
- 14: end if
- 15: Update the timing information of the paths in the current set n
- 16: else if buffer k is overlapped by all the paths in set n then
- 17: Evaluate the optimality using Equation (9)
- 18: if the replacement decreases the clock skew among buffer sets then
- 19: Replace the current buffer
- 20: Update the timing information of the current buffer set
- 21: end if
- 22: end if
- 23: end for
- 24: end for
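The listing above is the divide-and-conquer flow of [1]. As a much-simplified illustration, the Python sketch below drops the path-set partitioning (steps 6 to 8 and 16 to 21) and simply visits buffers one at a time, keeping a standard-to-high-Vth swap only when Equation (9) holds: every path still meets $\Gamma - \Delta\Gamma - \Delta\tau$ at time 0, and the skew evaluated on aged delays decreases. The `Buffer` fields, the per-buffer delay numbers (in ps), and the choice to evaluate skew on aged delays are hypothetical stand-ins for the STA and aging data of steps 2 and 3.

```python
from dataclasses import dataclass

# Simplified greedy sketch of Eq. (9)-guided buffer replacement.
# Delays are hypothetical per-buffer values (ps) that an aging-aware STA with a
# multi-Vth library would supply; no path-set partitioning is performed.

@dataclass
class Buffer:
    fresh_svt: float  # delay at time 0, standard Vth
    aged_svt: float   # delay after stress time t, standard Vth
    fresh_hvt: float  # delay at time 0, high Vth
    aged_hvt: float   # delay after stress time t, high Vth (degrades less)
    use_hvt: bool = False

    def delay(self, aged: bool) -> float:
        if self.use_hvt:
            return self.aged_hvt if aged else self.fresh_hvt
        return self.aged_svt if aged else self.fresh_svt

def path_delay(bufs, path, aged):
    return sum(bufs[name].delay(aged) for name in path)

def skew(bufs, paths, aged):
    delays = [path_delay(bufs, p, aged) for p in paths]
    return max(delays) - min(delays)

def greedy_replace(bufs, paths, gamma, d_gamma, d_tau):
    limit = gamma - d_gamma - d_tau  # time-0 bound from Eqs. (8)/(9)
    replaced = []
    for name, buf in bufs.items():
        if buf.use_hvt:
            continue                 # already high Vth
        sk_init = skew(bufs, paths, aged=True)
        buf.use_hvt = True           # tentatively swap to high Vth
        meets_timing = all(path_delay(bufs, p, aged=False) <= limit for p in paths)
        if meets_timing and skew(bufs, paths, aged=True) < sk_init:
            replaced.append(name)    # Eq. (9) satisfied: keep the swap
        else:
            buf.use_hvt = False      # revert
    return replaced

# Hypothetical tree: root buffer "r" shared by two paths; branch "a" toggles more.
bufs = {
    "r":  Buffer(20, 22, 22, 23),
    "a1": Buffer(20, 24, 22, 23.5),
    "a2": Buffer(20, 24, 22, 23.5),
    "b1": Buffer(20, 21, 22, 22.5),
    "b2": Buffer(20, 21, 22, 22.5),
}
paths = [["r", "a1", "a2"], ["r", "b1", "b2"]]
before = skew(bufs, paths, aged=True)
replaced = greedy_replace(bufs, paths, gamma=70, d_gamma=4, d_tau=2)
after = skew(bufs, paths, aged=True)
print("aged skew before:", before, "replaced:", replaced, "aged skew after:", after)
```

On this toy tree the shared root buffer is rejected, because swapping it shifts both paths equally and leaves the skew unchanged, while the four branch buffers are accepted. The aged skew drops from 6 ps to 2 ps without pushing either time-0 path delay past the 64 ps limit.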

#### Conclusion

Here, two-phase clocking is implemented with clocked CMOS (C<sup>2</sup>MOS), which prevents glitches and unwanted hazards. The NORA dynamic CMOS technique uses a true non-overlapping two-phase clock signal $\phi$ and $\phi'$ and can avoid race problems caused by clock skew; it can provide higher clock rates than the C<sup>2</sup>MOS technique because it has negligible dead time and no skew problem. Synchronization is used in the vast majority of digital sequential circuits, which include registers, flip-flops, latches, and memory elements; in this study, flip-flops and latches have been implemented in combinations of one block, two blocks, and many blocks. Pipelining is a key element of high-performance design and a straightforward technique for synchronous systems: complex function blocks are subdivided into smaller blocks, registers are inserted to separate them, and the global clock is applied to all registers. The block diagram of the pipeline section is shown in Fig. 6, and the simulation waveforms are shown in Fig. 8.

In this paper, true single-phase clocking is modified to two-phase clocking by implementing a pipeline section, which yields non-overlapping clock signals and relaxes the constraints on circuit delay, clock period, and clock width required for correct operation. The two-phase clocking scheme (TPCS) also avoids multi-stepping and race conditions. With two-phase clock signals, the circuit becomes a clocked (synchronized) circuit. The pipeline technique enhances the two-phase system, and the use of flip-flops provides high performance and faster speed.

Finally, an algorithm is proposed that uses a margin to reduce skew by identifying the critical clock buffers in the clock tree. The identified buffers are replaced by their high-*Vth* counterparts.



#### **References:**

- [1] Jifeng Chen and Mohammad Tehranipoor, "A Novel Flow for Reducing Clock Skew Considering NBTI Effect and Process Variations," International Symposium on Quality Electronic Design (ISQED), 2013.
- [2] ZHANG Xue-min and Wang We-dong, "A two-phase non-overlapping clock generator with adjustable duty cycle," Journal of Circuits and Systems, 2013.
- [3] Zhao Lei, Yang Yintang, Zhu Zhangming, and Liu Lianxi, "A clock generator for a high-speed high-resolution pipelined A/D converter," Journal of Semiconductors, February 2013.
- [4] S. Bhardwaj, W. Wang, R. Vattikonda, Y. Cao, and S. Vrudhula, "Predictive Modeling of the NBTI Effect for Reliable Design," IEEE Custom Integrated Circuits Conference (CICC), 2006.
- [5] X. Wei, Y. Cai, and H. Kong, "Clock Skew Under Process Variations," Proceedings of the International Symposium on Quality Electronic Design (ISQED), pp. 237-242, March 2006.
- [6] Ashutosh Chakraborty and David Z. Pan, "Skew Management of NBTI Impacted Gated Clock Trees," International Symposium on Physical Design (ISPD), 2010.
- [7] Hamed Abrishami, Safar Hatami, Behnam Amelifard, and Massoud Pedram, "NBTI-Aware Flip-Flop Characterization and Design," GLSVLSI, 2008.
- [8] Ram Rakesh, K.S. Yadav, and Jaipal, "Reduction of Leakage by Input Vectors with Constrained NBTI Degradation," International Journal of Information Technology and Knowledge Management, pp. 163-167, 2011.